Abstract. - We consider the conditional galaxy density around each galaxy and study its fluctuations in the newest samples of the Sloan Digital Sky Survey Data Release 7. Over a large range of scales, both the average conditional density and its variance show a nontrivial scaling behavior, which resembles criticality. For 10 < r < 80 Mpc/h, the density depends only weakly (logarithmically) on the system size. Correspondingly, we find that the density fluctuations follow the Gumbel distribution of extreme value statistics. This distribution is clearly distinguishable from a Gaussian distribution, which would arise for a homogeneous spatial galaxy configuration. We also point out similarities between the galaxy distribution and critical systems of statistical physics.
Introduction. — One of the cornerstones of modern cosmology is the mapping of three-dimensional galaxy distributions. In the last decade two extensive projects, the Sloan Digital Sky Survey (SDSS [1]) and the Two degree Field Galaxy Redshift Survey (2dFGRS [2]), have provided redshifts of unprecedented quality for more than one million galaxies. A common feature observed in these surveys [3,4] is that galaxies are organized in a complex pattern, characterized by large scale structures: clusters, superclusters and filaments, together with large voids of extremely low local density [5]. Recent analyses of these catalogs have shown that galaxy structures display large amplitude density fluctuations at all scales, limited only by sample sizes [6-10]. In addition, the conditional density [11] has been found to decay with distance as a power-law function with an exponent close to one, up to ~ 30 Mpc/h¹. At larger scales the situation was unclear: in the 2dFGRS the relatively small solid angle prevents a proper characterization of correlations [9,10], while the SDSS samples of data release 6 (DR6) clearly show that conditional fluctuations are not self-averaging for r > 30 Mpc/h. In the latter case, the sample volumes were found to be too small to obtain statistically stable results because of wild fluctuations [6,7]. Therefore, although there is unambiguous evidence for the inhomogeneity of the galaxy distribution at least up to scales of 100 Mpc/h [6,7,9,10], the scaling properties at scales larger than 30 Mpc/h were poorly understood.

¹ We use H_0 = 100h km/sec/Mpc, with 0.4 < h < 0.7, for the Hubble constant.

The new galaxy samples from data release 7 (DR7 [12]) are about twice as large as the DR6 samples. This new catalog is large enough to allow a detailed study of fluctuations in the galaxy distribution. In particular, we calculate the galaxy density in a sphere of radius r around each galaxy, i.e., the conditional density. For uniformly positioned galaxies [11], the average conditional density is independent of the radius r, and its fluctuations over galaxies are Gaussian. In DR7, by contrast, we find that the average density depends logarithmically on r, while the fluctuations follow the Gumbel distribution of extreme value statistics. This behavior has an analog in statistical physics, where logarithmically changing averages tend to correspond to Gumbel-type fluctuations [13].

The rest of the paper is organized as follows. We first define the quantities we measure and briefly review the main properties of the Gumbel distribution. We then introduce the galaxy samples and present our main results on the average, the variance and the fluctuation distribution of the conditional density measured in the data. Finally, we discuss the results and draw conclusions.

Statistical methods. — In this section we describe the estimators we use in the analysis and then discuss the properties of the Gumbel distribution. We also provide some physical examples where the Gumbel distribution has been fitted to experimental data.

Estimators and their main properties. A particularly useful characterization of the statistical properties of point distributions is obtained by measuring conditional quantities [11]. In this paper we focus on one such quantity: the number N(r) of galaxies contained in a sphere of radius r centered on a galaxy. Note that not all galaxies can be used as sphere centers for a given radius r: a central galaxy has to be farther than a distance r from any border of the sample, so that the sphere is fully contained inside the sample volume [7,10]. As r approaches the radius of the largest sphere fully contained in the sample volume, the statistics become poorer, for two reasons: (i) the number M(r) of points satisfying the above condition is greatly reduced, and (ii) most of these points are located in the same region of the sample. Any conclusion about statistical properties at large r must therefore rest on a careful analysis of these limitations [7].
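The estimator and its boundary condition can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used for the measurements: it assumes a cubic sample volume [0, L]³ instead of the real survey geometry, uses brute-force distances (adequate only for small samples; a k-d tree such as SciPy's `cKDTree` would be needed for ~10⁵ galaxies), and excludes the central galaxy from its own count, which is a convention we assume here. The function name `conditional_density` is hypothetical.

```python
import numpy as np


def conditional_density(points, r, box_size):
    """Return n_i(r) = N_i(r) / V(r) for every galaxy usable as a sphere centre.

    Assumes (hypothetically) a cubic sample volume [0, box_size]^3.
    A point qualifies as a centre only if it lies farther than r from
    every face, so that the whole sphere is inside the sample.
    The length of the returned array is M(r).
    """
    points = np.asarray(points, dtype=float)
    # Distance from each point to the nearest face of the cube.
    dist_to_border = np.minimum(points, box_size - points).min(axis=1)
    centres = points[dist_to_border > r]
    if len(centres) == 0:
        return np.empty(0)
    # Brute-force centre-to-point distances: O(M * N) memory, small samples only.
    d = np.linalg.norm(centres[:, None, :] - points[None, :, :], axis=-1)
    # Count neighbours within r; d > 0 excludes the centre itself.
    counts = np.count_nonzero((d <= r) & (d > 0), axis=1)
    volume = 4.0 * np.pi * r**3 / 3.0
    return counts / volume
```

For points drawn uniformly in the cube, the mean of the returned array should be close to the sample density N/L³ at every r, which is the constant behavior expected for a homogeneous distribution.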

The Gumbel distribution. The Gumbel (also known as Fisher-Tippett-Gumbel) distribution is one of the three extreme value distributions [14,15]. It describes the distribution of the largest value of a random variable drawn from a density function with faster than algebraic (e.g., exponential) decay. The Gumbel probability density function (PDF) is

$$ P(y) = \frac{1}{\beta} \exp\left[ -\frac{y-\alpha}{\beta} - \exp\left( -\frac{y-\alpha}{\beta} \right) \right]. \qquad (1) $$

With the scaling variable

$$ x = \frac{y-\alpha}{\beta}, \qquad (2) $$

the density function (Eq. 1) simplifies to the parameter-free Gumbel PDF

$$ P(x) = e^{-x-e^{-x}}, \qquad (3) $$

with cumulative distribution $e^{-e^{-x}}$. Note that this distribution corresponds to large extremes; for small extreme values, $x$ is used instead of $-x$ in the Gumbel distribution.

The mean and the variance of the Gumbel distribution (Eq. 1) are

$$ \langle y \rangle = \alpha + \gamma\beta, \qquad \sigma^2 = \frac{(\beta\pi)^2}{6}, \qquad (4) $$

where $\gamma = 0.5772\ldots$ is the Euler constant. For the scaled Gumbel distribution (Eq. 3), the first two cumulants of Eq. 4 simplify to $\gamma$ and $\pi^2/6$.
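As a quick numerical check of Eqs. (1)-(4), the following Python sketch (helper names are hypothetical; only NumPy is assumed) evaluates the Gumbel PDF and its cumulative distribution, and compares the moments obtained by direct numerical integration with the closed forms of Eq. (4).

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329  # Euler's constant gamma


def gumbel_pdf(y, alpha, beta):
    """Gumbel (largest-value) PDF of Eq. (1), location alpha, scale beta."""
    x = (np.asarray(y, dtype=float) - alpha) / beta  # scaling variable, Eq. (2)
    return np.exp(-x - np.exp(-x)) / beta           # Eq. (3) times the Jacobian 1/beta


def gumbel_cdf(y, alpha, beta):
    """Cumulative distribution exp(-exp(-x)) written in terms of y."""
    x = (np.asarray(y, dtype=float) - alpha) / beta
    return np.exp(-np.exp(-x))


def gumbel_moments(alpha, beta):
    """Mean and variance from Eq. (4)."""
    return alpha + EULER_GAMMA * beta, (beta * np.pi) ** 2 / 6.0


if __name__ == "__main__":
    alpha, beta = 0.5, 0.2
    # The PDF is negligible outside [alpha - 10 beta, alpha + 40 beta].
    y = np.linspace(alpha - 10 * beta, alpha + 40 * beta, 200_001)
    dy = y[1] - y[0]
    p = gumbel_pdf(y, alpha, beta)
    mean = np.sum(y * p) * dy                # numerical first moment
    var = np.sum((y - mean) ** 2 * p) * dy   # numerical variance
    print(mean, var, gumbel_moments(alpha, beta))
```

The asymmetric integration window reflects the shape of the distribution: the left tail decays doubly exponentially, the right tail only exponentially.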



Gumbel in critical systems. Away from criticality, any global (spatially averaged) observable of a macroscopic system has Gaussian fluctuations, in agreement with the central limit theorem (CLT), since the system can be divided into many nearly independent regions of the size of the correlation length. At criticality, however, the correlation length diverges and the CLT no longer applies. Indeed, global quantities in critical systems usually have non-Gaussian fluctuations, and the type of fluctuations is characteristic of the universality class of the system's critical behavior [16,17].

To fit experimental data, the generalized Gumbel PDF $P(x) \propto (e^{-x-e^{-x}})^a$ has often been used, where a is a real parameter. For integer values of a, this distribution corresponds (up to a shift of x) to the a-th largest value of a random variable; the case a = 1 is the Gumbel distribution. Experimental examples of Gumbel or generalized Gumbel distributions include the power consumption of a turbulent flow [18], the roughness of voltage fluctuations in a resistor (the original Gumbel case a = 1) [19], plasma density fluctuations in a tokamak [20], orientation fluctuations in a liquid crystal [21], and other systems cited in [13]. The Gumbel distribution describing fluctuations of a global observable was first obtained analytically in [19] for the roughness fluctuations of 1/f noise. Its relation to extreme value statistics has since been clarified [22,23], generalizations have appeared [24], and the related finite-size corrections are understood [25].
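For reference, the generalized Gumbel PDF can be normalized in closed form: the substitution $t = a e^{-x}$ gives $\int \exp[-a(x+e^{-x})]\,dx = \Gamma(a)/a^a$, so $P_a(x) = \frac{a^a}{\Gamma(a)} \exp[-a(x+e^{-x})]$. The Python sketch below is our own helper (the name is hypothetical, not taken from [13] or [18-21]); it implements this normalization for real a > 0 and reduces to Eq. (3) for a = 1.

```python
import math

import numpy as np


def generalized_gumbel_pdf(x, a):
    """Normalized generalized Gumbel PDF  a^a / Gamma(a) * exp[-a (x + e^{-x})].

    The constant follows from the substitution t = a e^{-x}; a = 1 gives Eq. (3).
    Valid for any real a > 0.
    """
    x = np.asarray(x, dtype=float)
    log_norm = a * math.log(a) - math.lgamma(a)  # log of a^a / Gamma(a)
    return np.exp(log_norm - a * (x + np.exp(-x)))
```

Working with the logarithm of the normalization constant avoids overflow of `a**a` and `Gamma(a)` for large a.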

In a recent paper Bramwell [13] conjectured that only three types of distributions describe fluctuations of global observables at criticality. In particular, when the global observable depends logarithmically on the system size, the corresponding distribution should be a (generalized) Gumbel. For example, the mean roughness of 1/f signals depends on the logarithm of the observation time (the system size), and the corresponding PDF is indeed the Gumbel distribution [19].
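A toy illustration of this link between logarithmic size dependence and Gumbel statistics (our own example, not taken from [13] or [19]): the maximum of n independent unit-exponential variables has mean $H_n = 1 + 1/2 + \ldots + 1/n \approx \ln n + \gamma$, i.e., it grows logarithmically with the "system size" n, while its fluctuations converge to the Gumbel distribution with variance $\pi^2/6$. The Python sketch below checks both facts by simulation; the sample sizes and the seed are arbitrary choices.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329  # Euler's constant


def max_of_exponentials(n, trials, seed=0):
    """Largest of n i.i.d. unit-exponential variables, repeated `trials` times."""
    rng = np.random.default_rng(seed)
    return rng.exponential(scale=1.0, size=(trials, n)).max(axis=1)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        m = max_of_exponentials(n, trials=5_000)
        exact_mean = np.sum(1.0 / np.arange(1, n + 1))  # E[max] = H_n exactly
        # The mean grows like ln(n) + gamma; the variance approaches pi^2 / 6.
        print(n, m.mean(), exact_mean, np.log(n) + EULER_GAMMA, m.var())
```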

The Data. — We have constructed several subsamples of the main-galaxy (MG) sample of the spectroscopic catalog SDSS-DR7². We have constrained the flags indicating the object type so as to select only galaxies from the MG sample. We then consider galaxies in the redshift range $10^{-4} < z < 0.3$ with redshift confidence $z_{conf} \ge 0.35$ and with flags indicating no significant redshift determination errors. In addition, we apply the apparent magnitude condition $m_r < 17.77$ [26]. The angular region we consider is limited, in the SDSS internal angular coordinates, by $-33.5^\circ < \eta < 36.0^\circ$ and $-48.0^\circ < \lambda < 51.5^\circ$; the resulting solid angle is $\Omega = 1.85$ steradians. We do not apply corrections for the redshift completeness mask or for fiber collision effects. Fiber collisions in general do not present a problem for measurements of large scale galaxy correlations [26]. Completeness varies most near the current survey edges, which are excluded from our samples. Moreover, the completeness mask could be the main source of systematic effects only on small scales, while we are interested in the correlation properties at relatively large separations [8].

To construct volume-limited (VL) samples we computed the metric distances R using the standard cosmological parameters, i.e., $\Omega_M = 0.3$ and $\Omega_\Lambda = 0.7$. We computed absolute magnitudes using Petrosian apparent magnitudes in the r filter, corrected for Galactic absorption. We considered the sample limited by R ∈ [70, 450] Mpc/h and $M_r \in [-21.8, -20.8]$, containing M = 93821 galaxies. This sample contains about one fifth of all DR7 galaxies and has a relatively large spatial extension and a small spread in galaxy luminosity. Note that we found similar results in other samples limited at scales smaller than ~ 400 Mpc/h.
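The metric distance used above can be sketched as follows, assuming a spatially flat model with $\Omega_M = 0.3$ and $\Omega_\Lambda = 0.7$ (for which the metric distance equals the line-of-sight comoving distance), distances in Mpc/h (so that c/H_0 ≈ 2997.9 Mpc/h), and a simple trapezoidal integration. The function names and the integration scheme are our own illustrative choices; the mask reproduces only the R and $M_r$ limits quoted above, not the full selection.

```python
import numpy as np

C_OVER_H0 = 2997.92458  # Hubble distance c / H_0 in Mpc/h


def metric_distance(z, omega_m=0.3, omega_l=0.7, steps=2000):
    """Comoving (metric) distance in Mpc/h for a flat universe (omega_m + omega_l = 1).

    R(z) = (c / H_0) * integral_0^z dz' / E(z'),  E(z) = sqrt(omega_m (1+z)^3 + omega_l).
    """
    zs = np.linspace(0.0, z, steps + 1)
    inv_e = 1.0 / np.sqrt(omega_m * (1.0 + zs) ** 3 + omega_l)
    dz = zs[1] - zs[0]
    # Trapezoidal rule written out explicitly.
    return C_OVER_H0 * dz * (inv_e.sum() - 0.5 * (inv_e[0] + inv_e[-1]))


def volume_limited_mask(R, M_r, r_range=(70.0, 450.0), mag_range=(-21.8, -20.8)):
    """Boolean mask selecting galaxies inside the VL limits on R and M_r."""
    R = np.asarray(R, dtype=float)
    M_r = np.asarray(M_r, dtype=float)
    return ((R >= r_range[0]) & (R <= r_range[1])
            & (M_r >= mag_range[0]) & (M_r <= mag_range[1]))
```

For example, `metric_distance(0.1)` is close to 293 Mpc/h; the absolute magnitudes entering the mask would additionally require the K-corrections discussed below.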

We have checked that our main results for this VL sample do not depend significantly on K-corrections and/or evolutionary corrections such as those used in [27]. In this paper we use the standard K-corrections from the VAGC data (see [7] for a more detailed discussion).

Results. — In this section we present our findings 
from the analysis of the galaxy data. 

We have computed the number of galaxies $N_i(r)$ within radius r around each galaxy i satisfying the boundary condition mentioned above. Normalizing it by the volume $V(r) = 4\pi r^3/3$ of the sphere, we obtain the conditional galaxy density $n_i(r) = N_i(r)/V(r)$ around each galaxy i. This quantity is the main object of interest in this paper. Since $n_i(r)$ differs from galaxy to galaxy, we treat this local density as a random variable and study its statistical properties. For example, the conditional average density within radius r is defined as

$$ \bar{n}(r) = \frac{1}{M(r)} \sum_{i=1}^{M(r)} n_i(r), \qquad (5) $$

where $\bar{n}(r)$ is "conditioned" on the presence of the central galaxy. The simplest quantity characterizing density fluctuations is the variance, or mean square deviation at scale r, which is defined as

$$ \sigma^2(r) = \mathrm{var}[n(r)] = \overline{n^2(r)} - \bar{n}(r)^2. \qquad (6) $$

In the following subsections we also study the whole distribution of $n_i(r)$.
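Given an array of $n_i(r)$ values at one radius, Eqs. (5) and (6) reduce to a few lines of NumPy. The sketch below uses hypothetical helper names and also includes, as a reference, the variance expected for a Poisson process of mean density $n_0$: there N(r) is Poisson-distributed with mean $n_0 V(r)$, hence $\mathrm{var}[n] = n_0/V(r) \propto r^{-3}$.

```python
import numpy as np


def conditional_moments(n_i):
    """Average (Eq. 5) and variance (Eq. 6) of n_i(r) over the M(r) valid centres."""
    n_i = np.asarray(n_i, dtype=float)
    mean = n_i.mean()                     # \bar n(r) = (1/M) sum_i n_i(r)
    var = np.mean(n_i ** 2) - mean ** 2   # \overline{n^2} - \bar n^2
    return mean, var


def poisson_variance(n0, r):
    """Reference variance of N(r)/V(r) for a Poisson process of mean density n0.

    N(r) is Poisson with mean n0 V(r), so var[n] = n0 / V(r), which decays as 1/r^3.
    """
    volume = 4.0 * np.pi * r ** 3 / 3.0
    return n0 / volume
```

Plotting the measured $\sigma^2(r)$ against `poisson_variance` on logarithmic axes makes any departure from the $r^{-3}$ decay of uncorrelated points immediately visible.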

Self-averaging properties. Conditional fluctuations have been found not to be self-averaging in several SDSS-DR6 samples, i.e., there were systematic differences between the statistical properties measured in different parts of a given sample [6,9]. It was concluded that this behavior is due to galaxy density fluctuations that are too large in amplitude and too extended in space to be self-averaging inside the considered volumes. The lack of self-averaging prevents one from extracting statistically meaningful information from whole-sample average quantities, such as the conditional average density. We repeated the stability test of statistical quantities within the new SDSS-DR7 sample, since it is almost twice the size of SDSS-DR6. To this aim we cut the sample volume into two regions, a nearby and a faraway one as in [6,7], and determined the PDF $P(n(r)) = P(n; r)$ of the conditional density separately in both regions and at two different scales r. We conclude from Fig. 1 that the PDF is statistically stable and does not show a systematic dependence on system size, in contrast to the SDSS-DR6 data on scales r > 30 Mpc/h [6,7]. Hence, in this new sample, conditional statistical quantities computed over the whole sample volume are useful and meaningful indicators.

² http://sdss.physics.nyu.edu/vagc/

Fig. 1: PDFs of the conditional density in spheres of radius r = 30 Mpc/h (left) and r = 80 Mpc/h (right), in two distinct regions: a nearby one (S1) and a faraway one (S2). Notice that the two PDFs are statistically consistent with each other.
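Fig. 1 compares the two regional PDFs visually. One simple way to quantify such a comparison, added here as an illustration and not the procedure used above, is the two-sample Kolmogorov-Smirnov statistic: the largest gap between the empirical cumulative distributions of $n_i(r)$ in the nearby and faraway regions, with small values indicating compatible PDFs. The sketch assumes only NumPy; the function name is hypothetical.

```python
import numpy as np


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a = np.sort(np.asarray(sample_a, dtype=float))
    b = np.sort(np.asarray(sample_b, dtype=float))
    grid = np.concatenate([a, b])
    # Empirical CDFs evaluated at every observed value (right-continuous).
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return np.max(np.abs(cdf_a - cdf_b))
```

Converting D into a p-value with the textbook formula would assume independent values, which does not hold here because spheres around nearby galaxies overlap; the statistic is therefore best used as a relative measure, e.g., compared across radii.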

Scaling at small scales. At small length scales (r < 20 Mpc/h) the exponent of the power-law decay of the conditional average density is close to minus one (see Fig. 2). This result agrees with those obtained by the same method in a number of different samples (see [6,7,9,10] and references therein). This scaling can be interpreted as a signature of fractality of the galaxy distribution in this range of scales. In addition, it implies that the distribution is not uniform at these scales, and thus the standard two-point correlation function, whose estimation assumes that the sample average density is a reliable estimate of the true mean density, is substantially biased.
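The link between the exponent and fractality can be made explicit with the standard relation for a fractal point set of dimension D (a textbook relation stated here for illustration, not an additional measurement): if the average number of points within a distance r of a point grows as $\langle N(r) \rangle \propto r^D$, then

```latex
\bar{n}(r) = \frac{\langle N(r) \rangle}{V(r)} \propto \frac{r^{D}}{r^{3}} = r^{D-3},
\qquad
D - 3 \approx -1 \;\Longrightarrow\; D \approx 2 .
```

An exponent close to minus one thus corresponds to a fractal dimension close to two, whereas a uniform distribution (D = 3) gives a conditional density independent of r.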

Scaling at large scales. We first computed the average conditional density (Eq. 5) at large scales (r > 10 Mpc/h). For a uniform point distribution this quantity is constant, i.e., independent of the radius r [11]. In our data, by contrast, we find a pronounced r dependence, as can be seen in Fig. 2. Our best fit is of the form

$$ \bar{n}(r) \approx \frac{A}{\log r}, \qquad (7) $$

with the amplitude A as the only fitting parameter; that is, the average density depends only weakly (logarithmically) on r. Alternatively, an almost indistinguishable power-law fit is provided by

$$ \bar{n}(r) \approx 0.011 \times r^{-\delta}, \qquad (8) $$

with an exponent $\delta \approx 0.3$ (see Fig. 2).
Fig. 2: Conditional average density $\bar{n}(r)$ of galaxies as a function of radius. The inset shows the same quantity over the full range of scales. Note the change of slope at ~ 20 Mpc/h and the lack of flattening up to ~ 80 Mpc/h. Our conjecture is that there is a logarithmic correction to the constant behavior, although we cannot exclude a power law with an exponent ~ -0.3.



We emphasize our preference for the logarithmic fit, whose only fitting parameter is the amplitude. As can be seen in Fig. 2, the slope of the conditional average density as a function of the radius r changes at about 20 Mpc/h. At this point the decay of the density changes from an inverse linear decay to a slow logarithmic one. Moreover, the density $\bar{n}(r)$ does not saturate to a constant up to ~ 80 Mpc/h, i.e., up to the largest scales probed in this sample. Note that up to r = 80 Mpc/h the number of points M(r) is larger than $10^4$, which makes these statistics very robust.
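A minimal sketch of how the two competing models for the average density (Eqs. 7 and 8) can be fitted. It assumes natural logarithms in Eq. 7 (the base is not specified above), fits the one-parameter model A/ln r by linear least squares (closed form $A = \sum_k n_k u_k / \sum_k u_k^2$ with $u_k = 1/\ln r_k$), and fits the power law by linear regression in log-log space, which weights the points differently from a nonlinear fit. The function names are hypothetical, and the synthetic values in the test are not survey measurements.

```python
import numpy as np


def fit_log_amplitude(r, n):
    """Least-squares amplitude A of the one-parameter model n(r) = A / ln r (needs r > 1)."""
    u = 1.0 / np.log(np.asarray(r, dtype=float))
    n = np.asarray(n, dtype=float)
    return np.sum(n * u) / np.sum(u * u)


def fit_power_law(r, n):
    """Fit n(r) = B * r**(-delta) by regressing ln n on ln r; returns (B, delta)."""
    slope, intercept = np.polyfit(np.log(r), np.log(n), 1)
    return np.exp(intercept), -slope
```

The similarity of the two fits has a simple origin: with natural logarithms, the local logarithmic slope of 1/ln r is d ln(1/ln r)/d ln r = -1/ln r, which ranges from about -0.33 at r = 20 Mpc/h to about -0.23 at r = 80 Mpc/h. Over this range the logarithmic model therefore mimics a power law with an exponent of roughly -0.3.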

This result agrees with a study of the SDSS-DR4 samples [28], where a similar change of slope in the average conditional density was observed at about the same scale r ≈ 20 Mpc/h, together with quite large sample-to-sample fluctuations. Subsequent studies found evidence that the galaxy distribution is still characterized by rather large fluctuations up to 100 Mpc/h, making it incompatible with uniformity [6-10]. Similarly, Hogg et al. [29] found a slope change in the average conditional density of the SDSS Luminous Red Galaxy (LRG) sample. However, we do not observe the transition to uniformity at about 70 Mpc/h that they reported. Note also that a study of the self-averaging properties of fluctuations in the LRG sample is still lacking.

Compared to the average density, it is harder to find the correct fit for the variance $\sigma^2(r)$ of the conditional density (Eq. 6). Our best fit is (see Fig. 3)

$$ \sigma^2(r) \approx 0.007 \times r^{-\ldots} \qquad (9) $$

Given the scaling behavior of the conditional density and variance, we conclude that galaxy structures are characterized by non-trivial correlations on scales up to r ≈ 80 Mpc/h.

Fig. 3: Variance of the conditional density $n_i(r)$ as a function of the radius. By comparison, the variance for a Poisson point process would decay as $1/r^3$.

To probe the whole distribution of the conditional density $n_i(r)$, we fitted the Gumbel distribution (Eq. 1) via its two parameters $\alpha$ and $\beta$. One of our best fits is obtained for r = 20 Mpc/h; see Fig. 4. Moreover, the data convincingly collapse onto the parameter-free Gumbel distribution (Eq. 3) for all values of r in the range 10 < r < 80 Mpc/h when the scaling variable x of Eq. 2 is used (see Figs. 5-6). Note that for a Poisson point process (uncorrelated random points) the fluctuations of the number N(r), and consequently of the density, follow exactly a Poisson distribution, which in turn converges to a Gaussian distribution for a large average number of points N(r) per sphere. In our samples N(r) is always larger than 20 galaxies, a regime where the Poisson and Gaussian PDFs differ by less than the uncertainty in our data. Note also that, due to the central limit theorem, all homogeneous point distributions with a finite correlation length (not only the Poisson process) lead to Gaussian fluctuations. Hence the appearance of the Gumbel distribution is a clear sign of inhomogeneity and large scale structures in our samples.

The fitting parameters in Eq. 1 varied with the radius r approximately as

$$ \alpha(r) \approx 0.007 \times r^{\ldots}, \qquad \beta(r) \approx 0.035 \times r^{\ldots}, \qquad (10) $$

although a logarithmic fit $\alpha \approx 0.0115/\log r$ cannot be excluded either. With the fitted values of $\alpha$ and $\beta$ we recover the (directly measured) average conditional density of galaxies through Eq. 4. On the other hand, there is a discrepancy between the directly measured variance $\sigma^2$ and that obtained from the Gumbel fits through Eq. 4. The reason for this discrepancy is that the uncertainty in the tail of the PDF $P(n; r)$ is amplified when we directly calculate the second moment.
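The inversion of Eq. 4 mentioned above amounts to a method-of-moments estimator: $\beta = \sqrt{6}\,\sigma/\pi$ and $\alpha = \langle y \rangle - \gamma\beta$. The sketch below is an illustrative alternative to the direct PDF fits of Fig. 4, not the fitting procedure used for them; as noted above, it inherits the sensitivity of the second moment to the tail of the distribution. The helper name is hypothetical; NumPy's `Generator.gumbel`, which samples the largest-value form of Eq. 1, is used only to generate synthetic test data.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329  # Euler's constant


def gumbel_moment_estimates(samples):
    """Invert Eq. (4): beta = sqrt(6) * sigma / pi, alpha = mean - gamma * beta."""
    samples = np.asarray(samples, dtype=float)
    beta = np.sqrt(6.0) * samples.std() / np.pi
    alpha = samples.mean() - EULER_GAMMA * beta
    return alpha, beta


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic largest-value Gumbel data with known parameters (not survey data).
    fake = rng.gumbel(loc=0.004, scale=0.001, size=100_000)
    print(gumbel_moment_estimates(fake))
```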
Fig. 4: One of the best fits, obtained for r = 20 Mpc/h. The data are rescaled by the fitted parameters $\alpha$ and $\beta$. The solid line corresponds to the parameter-free Gumbel distribution Eq. 3. The inset shows the same on a log-linear scale.
Fig. 7: The fitting-free data collapse (Eqs. 11-12) based on the first two moments of the distribution. Note again the satisfactory data collapse on all scales. The black line is the Gumbel distribution of Eq. (12), while the blue line is the standard normal (Gaussian) distribution with zero mean and unit variance.




Fig. 5: Data curves for different r, scaled together using the fitted parameters $\alpha$ and $\beta$ for each curve. The solid line is the parameter-free Gumbel distribution Eq. 3.
Fig. 6: The same as Fig. 5, but on a log-linear scale to emphasize the tails of the distribution.



Data collapse without fitting. It is possible to obtain a scaling of the data without any fitting procedure. We compute the average $\bar{n}(r)$ and the standard deviation $\sigma(r)$ of the data and use the scaled variable

$$ y = \frac{n(r) - \bar{n}(r)}{\sigma(r)}. \qquad (11) $$

The density functions for different values of r then collapse onto the single curve

$$ \Phi(y) = a\, e^{-(ay+\gamma) - e^{-(ay+\gamma)}}, \qquad (12) $$

with $a = \pi/\sqrt{6}$. (This function, of course, has zero mean and unit standard deviation.) This type of fitting-free data collapse has been used extensively in statistical physics [16,19]. As shown in Fig. 7, we find satisfactory agreement with Eq. 12. Note also that Gaussian fluctuations can be clearly excluded. Compared to the fitting results of Fig. 5, the agreement in Fig. 7 is better in the tails of the distribution but worse around the maximum. This latter mismatch is again due to the uncertainty in the second moment.
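The fitting-free collapse of Eqs. 11-12 can be sketched as follows: standardize the $n_i(r)$ values at each radius with their own sample mean and standard deviation, histogram them, and compare with $\Phi(y)$. The helper names are hypothetical and only NumPy is assumed; the numerical checks confirm that $\Phi$ has zero mean and unit variance for $a = \pi/\sqrt{6}$.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329
A_SCALE = np.pi / np.sqrt(6.0)  # a = pi / sqrt(6) in Eq. (12)


def standardize(n_i):
    """Scaled variable of Eq. (11): (n - mean) / std, using the sample moments."""
    n_i = np.asarray(n_i, dtype=float)
    return (n_i - n_i.mean()) / n_i.std()


def phi(y):
    """Standardized Gumbel density of Eq. (12)."""
    u = A_SCALE * np.asarray(y, dtype=float) + EULER_GAMMA
    return A_SCALE * np.exp(-u - np.exp(-u))


def collapse_histogram(n_i, bins=30):
    """Density-normalized histogram of the standardized data and its bin centres."""
    hist, edges = np.histogram(standardize(n_i), bins=bins, density=True)
    return 0.5 * (edges[:-1] + edges[1:]), hist
```

Overlaying `phi` on the collapsed histograms for several radii gives the kind of comparison shown in Fig. 7; a standard normal curve drawn in the same way highlights the skewness that distinguishes the Gumbel shape.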

Discussion. — Given the observed scaling and data collapse in the spatial galaxy data, is there any supporting evidence for the appearance of the Gumbel distribution? Based on the scaling and the data collapse, we argue that the large scale galaxy distribution shows similarities with critical systems. Here the galaxy density around each galaxy is analogous to a random variable describing a spatially averaged quantity in a volume. The average conditional galaxy density depends on the volume size (~ r³) only logarithmically, $\bar{n}(r) \sim 1/\log r$ from Eq. 7. According to Bramwell's conjecture for critical systems [13], if a spatially averaged quantity depends only weakly (e.g., logarithmically) on the system size, the distribution of this quantity follows the Gumbel distribution. This is indeed what we see in the galaxy data. Hence our two observations, about the average density and about the density distribution, are compatible with the behavior of critical systems in statistical physics.

We note that standard models of galaxy formation predict a homogeneous mass distribution beyond ~ 10 Mpc/h [6,7,30]. Explaining our findings of non-Gaussian fluctuations up to much larger scales presents a challenge for future theoretical galaxy formation models (see [6,7,9,10,30] for more details).

In summary, we have established scaling and data collapse over a wide range of radii (volumes) in galaxy data. Scaling in the data indicates criticality. The average galaxy density depends only logarithmically on the radius, which suggests a Gumbel scaling function [13]. The scaled data are indeed remarkably close to the Gumbel distribution, one of the three extreme value distributions. How this distribution arises through galaxy formation, and what the extreme quantity in the galaxy data is, are challenging questions to be addressed in the future.